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Abstract —We present a novel compressed sensing recovery 
algorithm - termed Bayesian Optimal Structured Signal Approx¬ 
imate Message Passing (BOSSAMP) - that jointly exploits the 
prior distribution and the structured sparsity of a signal that 
shall he recovered from noisy linear measurements. Structured 
sparsity is inherent to group sparse and jointly sparse signals. Our 
algorithm is based on approximate message passing that poses 
a low complexity recovery algorithm whose Bayesian optimal 
version allows to specify a prior distribution for each signal 
component. We utilize this feature in order to establish an 
iteration-wise extrinsic group update step, in which likelihood 
ratios of neighboring group elements provide soft information 
about a specific group element. Doing so, the recovery of 
structured signals is drastically improved. 

We derive the extrinsic group update step for a sparse binary 
and a sparse Gaussian signal prior, where the nonzero entries are 
either one or Gaussian distributed, respectively. We also explain 
how BOSSAMP is applicable to arbitrary sparse signals. 

Simulations demonstrate that our approach exhibits superior 
performance compared to the current state of the art, while it 
retains a simple iterative Implementation with low computational 
complexity. 

Index Terms —compressed sensing, message passing, group 
sparse, jointly sparse, sparse binary, Bernoulli-Gaussian, Gaus¬ 
sian mixture, extrinsic Information, turbo decoding 

I. Introduction 

Solving a linear system of equations y = Ax for x is 
an omnipresent problem in various fields, as a myriad of 
problem statements can be written in such form. While the 
very classical techniques such as least squares or Minimum 
Mean Squared Error (MMSE) estimation minimize the error 
with respect to the £ 2 -norm, the incorporation of additional 
knowledge like sparsity and structure of x has gained a lot of 
attention over recent years. 

In particular, compressed sensing was introduced in ifTl-ll^ 
to solve a linear system of equations in case of a sparsely 
populated x, i.e., the vector features only few nonzero entries. 
A change of paradigm was triggered by recognizing that a 
sparse x can be reconstructed perfectly from an underdeter¬ 
mined system of linear equations. In the context of signal 
processing, a signal vector x can thus be reconstructed from 
undersampling, where the sampling basis functions are the 
columns of A, and where the samples are stored in y. This 
led to a huge popularity in the search for efficient recovery 
algorithms that incorporate the sparsity constraint; the classical 
compressed sensing formulation 

X = arg min llxIL s.t. Ax = y, 

xGR^ 

where the pseudo norm ||x||p counts the number of nonzero en¬ 
tries in X, leads to a combinatorial search for x and is generally 


NP-hard to solve. The problem was relaxed into an £i-norm 
minimization called basis pursuit (denoising) which is also 
applicable to noisy samples y. An alternative formulation that 
introduces a controllable weight for the sparsity constraint is 
given by the Least Absolute Shrinkage and Selection Operator 
(LASSO) Ha. A computationally efficient recovery algorithm 
that iteratively solves the LASSO is provided by Approximate 
Message Passing (AMP) that was introduced in la-lil. Its 
foundation is Gaussian loopy belief propagation nni with 
simplified message passing that assumes high dimensional 
signal vectors x. Its Bayesian optimal version that exploits 
the signal prior is described in Q-iii, we will henceforth 
denote it as Bayesian optimal Approximate Message Passing 
(BAMP). BAMP was further extended in UTl in order to allow 
for non-linear and non-Gaussian output relations; the samples 
y are transformed by an arbitrary but known function, and the 
output of the transformation is known to the estimator. 

In many problems, x features a certain structure in the 
sparsity, i.e., collections of entries (groups) contain either only 
zeros or only nonzero entries. Group sparsity typically occurs 
in (image) classification tasks ca-iiii. To account for this, 
the LASSO was extended to the group LASSO ifTsl - lfTsl . 
Another established scheme is grouped orthogonal matching 
pursuit ESI. In the realm of AMP, the generalized approach 
M was extended in ll20l to incorporate the signal structure. 

Aside from group sparsity, structure is also found in jointly 
sparse signals, i.e., Nb vectors Xf, share a common sup¬ 
port (the nonzero entries occur at the same indices, V6 S 
{1,..., As}). A prominent instance of this is the multiple 
measurement vector problem El, El- Typical applications 
of the jointly sparse case are neuromagnetic imaging El, El 
and direction-of-arrival estimation ll24l . 

A. Contributions 

In this paper, we present a novel recovery algorithm - 
termed Bayesian Optimal Structured Signal Approximate Mes¬ 
sage Passing (BOSSAMP) - that extends BAMP to incorporate 
the signal structure, such as group or joint sparsity. The 
approach also allows for overlapping groups, and it is not 
restricted to the sparse case. The key feature is the inclusion of 
a group update step that is inspired by an extrinsic information 
exchange that is predominantly used in coding Il25l - ll28l . 
where it is also known as the turbo principle. In each iteration 
of BOSSAMP, the probability that a specific group entry was 
zero is updated by accumulating the extrinsic information 
of all other group entries in terms of likelihood ratios. This 
leads to a superior recovery performance compared to other 


state of the art approaches such as ini-ia, which we show 
by simulation. Furthermore, our algorithm converges to a 
solution in very few iterations, while its implementation stays 
simple and efficient. Specifically, it only requires two matrix- 
vector multiplications per iteration, matrix inversions are not 
required. 

B. Related Work and Novelty 

Merging (loopy) belief propagation with turbo equalization 
to recover structured sparse signals has already been suggested 
in ||2^ . where the factor graph of the observation structure was 
extended by a pattern structure. Hidden binary indicators were 
used to model whether signal entries are active (nonzero) or 
inactive (zero). Sparsity pattern beliefs are exchanged between 
the observation structure and the sparsity structure in an 
iterative manner by leveraging (loopy) belief propagation. This 
approach was later utilized in compressive imaging ll30l to ex¬ 
ploit the sparsity and persistence accross scales of 2D wavelet 
coefficients of natural images. A generalized manifestation of 
AMP that embeds the ideas of ll29l and presents an algorithmic 
implementation is provided in ll20l . to which we compare our 
scheme to. 

While 1291 . lIMl approach the topic from the message 
passing point of view, we focus on an alternative description 
that utilizes likelihood ratios in terms of L-values. We utilize 
the classic BAMP algorithm and extend the iteration loop by 
two steps, namely the group update and the subsequent prior 
update. On the one hand, BAMP assumes independently but 
non-identically distributed signal entries in x to perform scalar 
MMSE estimation. On the other hand, our two additional steps 
update the entry-wise prior information for the next BAMP 
iteration by exploiting the structure in the sparsity. In a ’’ping- 
pong” manner, an MMSE estimate emerges. 

We give a detailed and easy-to-follow derivation of the 
group update step, specifically for the following two prominent 
cases, where we provide simple closed form expressions: first 
for the sparse binary case, where L-values can be formulated 
naturally, and then for the sparse Gaussian (also known as 
Bernoulli-Gaussian) case, where we introduce a latent binary 
variable that indicates whether an entry was zero or nonzero. 

While the resulting BOSSAMP algorithm shares similarities 
with the Hybrid Generalized Approximate Message Passing 
(HGAMP) algorithm ll20]| which is applicable to a more 
general class of problems, our approach sticks to the standard 
AMP framework l6l-||9l which is mainly applicable to samples 
that are corrupted by Gaussian noise. While being restricted to 
a smaller class of problems, this leads to a simpler implemen¬ 
tation and a higher comprehensibility. Eurthermore, simulation 
results suggest that our approach outperforms HGAMP in 
terms of recovery performance and phase transitions. 

C. Notation 


an M X iV matrix is denoted A(:) = [A^^,..., The 

N X N identity matrix is denoted I^r. The length N all- 
one vector is denoted Ij^, while the N x N all-one matrix 
is denoted IjsixN- Similarly, we define the all-zero vector 
Oat and the all-zero matrix Oatxw- Calligraphic letters S 
denote sets, their usage as subscript implies that only the 
vector entries defined by the elements in S are selected. The 
cardinality of a set is denoted by |iS|. Random variables and 
vectors are denoted by sans serif font as x and x, respectively. 
While X ^ J\f{p,,a^) denotes a Gaussian distributed random 
variable x with mean p, and variance cr^, the shorthand notation 

denotes that such a Gaussian distribution is evaluated at the 
value X. 


D. Outline 

The re mainder of this paper is outlined as follows: 


Section III reviews compressed sensing and draws the link 
between the probabilistic graphical model of the estimation 
and the iterative recovery schemes AMP and BAMP, the 
algorithmic implementation of these is also presented. Finally, 
two prominent sparse signal priors, namely the sp arse binary 
and the sparse Gaussian prior, are introduced. ISection IIll 
describes the group sparse case and presents the correspond¬ 
ing BOSSAMP algorithm, whose update rules are derived 
for the spar se binary a nd the sparse Gaussian signal prior, 
respe ctively. Section IVl does the same for the jointly sparse 


case. 


Section V discusses the extension of BOSSAMP to the 


arbitrary signal case. ISection VIl introduces the figures of merit 
and the comparative schemes for simulation, and then presents 
and discusse s the numerical results. The paper is concluded in 
Section VII. 


II. Recovery oe Sparse Signals 
A. Compressed Sensing 

Compressed sensing was introduced in Q-El to recon¬ 
struct a high-dimensional signal vector x G from M < N 
linear measurements 


y = Ax -k w, (2) 

where y G M.^ is the measurement vector, A G is the 

fixed sensing matrix and w G is additive measurement 
noise. Signal vector x is assumed to be AT-sparse: it has at 
most K out of N no nzer o entries, where K N. This enables 
to recover x from although the system of equations is 
underdetermined. To ensure stable recovery from noisy mea¬ 
surements, the sensing matrix A has to satisfy the Restricted 
Isometry Property (RIP) IS 

(l-^2ic)||v||^<||Av||^<(l + <52A)||v||^ (3) 


Boldface letters such as A and a denote matrices and 
vectors, respectively. Considering matrix A, A^ ^ is its i-th 
row, while A.j is its j-th column. The superscript denotes 
the transposition of a matrix or vector. The vectorization of 


of order 2K and level 62K G (0,1) for all 2Ar-sparse vectors 
V, which basically implies that the linear operator A preserves 
the Euclidean distance between every pair of A-sparse vectors 
up to a small constant 62 k- 

















An appropriate sensing matrix can be constructed by picking 
A randomly with i.i.d. (sub-)Gaussian entries. Such matrices 
were proven in lISTI to almost surely satisfy the RIP while the 
number of required measurements for successful recovery is 
lower bounded by 


M = 


N' 

cK log — 


(4) 


where c is a small constant and the ceiling operation [•] 
ensures an integer number of measurements. 


B. Graphical Model 

A graphical model poses the probabilistic foundation of 
compressed sensing recovery algorithms that are based on 
message passing, such as AMP and BAMP, see 13^ . The 
key ingredient is a factorization of a multivariate distribution 
- in our case the posterior distribution of x given y based on 
1(211 - into many factors. This factorization is described and 
visualized by a factor graph ||34| . which we will now 
present. 

Let us beg in w ith the underlying probabilistic assumptions. 
Considering l(2ll we assume that only sensing matrix A 
is deterministic (fixed), which leaves us to characterize the 
distributions of measurement vector y = [j/i,..., ym, ■■■, 
signal vector x = [a;i,..., a;„,..., and noise vector 

W = 

We assume white Gaussian noise with zero mean and co- 
variance i-e., w ^ Af{0, The noise Probability 

Density Function (PDF) calculates as 

M M 

.fwCw) = /w(Wm) = A/'(W™|0, Cr^). (5) 

m—1 m—1 

The joint PDF of signal and measurement can be factored 
according to Bayes’ rule: 

/x.y(x,y) = /x|y(x|y) /y(y) = /y|x(y|x)^ ^(:!^. (6) 

posterior likelihood prior 

We assume independently distributed signal entries with PDF 
/x„(a^n), the prior, thus, factors as 

N 

/x(x) = n/x„(a;„). (7) 

n—1 

The likelihood is characterized by the noise PDF: 

M 

/y|x(y|x) = /w(y - Ax) = n ~ Arn,:X) . (8) 

m—1 

Estimators of x typically rely on the posterior 

M 

/x|y(x|y) = n -n —T/ym|x(2/m|x)/x(x) (9) 

m=i -ty”* ly™-' 
that entails M factors. 

The resulting factor graph FG = {y,T,E) consists of the 
variable nodes V = A} that encompass the sign al of 

interest x, the factor nodes iF = {1,..., M} associated to [mi 


Factor Graph FG = (V, S) 



Fig. 1 . Factor graph of measurement El] associated to EU 


and the edges E = {{m, n) : m G F,n G V} that correspond 
to the relations between x and y which are dictated by the 
entries of A, i.e., A^ „. Since typically, all entries o f A are 
nonzero, our bipartite graph, as illustrated by Figure ll is fully 
connected. 

As we intend to recover x from y knowing A, we formulate 
the MMSE estimator 


XMMSE(y) = Ex [x|y = y] = / x/^|y(x|y)dx. (10) 

jR'y 

This task can be approximateljQ solved utilizing message 
passing (belief propagation) and the sum-product algorithm 
ll^ - ll35l on our factor graph. However, a computationally 
efficient method is only obtained after a series of assumptions 
and approximations - described in ||9l - that yield the AMP 
algorithm. Note that AMP performs scalar MMSE estimation 
independently for every component of x. 


C. Approximate Message Passing (AMP) 

AMP has been introduced in iSl-llH to efficiently solve the 
LASSO problem Q, also known as basis pursuit denoising 
a, that constitutes a non-linear convex optimization problem 


XLASSo(y; A) = arginin 

X 


1 

2 


l|y- 


Ax||2 + A||x||^ 


( 11 ) 


The underlying intuition is to find the most accurate solution 
with the smallest support (motivated by the assumed sparsity 
of x), where A allows for a trade-off between accuracy 
with reject to the ^2 observation error ||y — AxU^ and the 
sparsitjo ||x||]^ of the solution. 

An illu strati ve approach to obtain the LASSO from MMSE 
(10')l is to assume that the signal vector entries are 


estimator 


Laplacian distributed, i.e., fx„{xn) = /uapiaceCa;™; 0, k) with 
fhapiacsix] y, k) = T gxp ^ |a; -/r|^ . (12) 


The zero mean Laplace distribution poses a sparsity enforcing 
prior, i.e., its probability mass is co ncen trated around zero. 
Plugging the La place signal prior intoji^ and calcu lating the 
MMSE estimate IdOl we obtain the LASSO (1 111 with A = 
alln. 


^The considered graph typically contains cycles which lead to loopy belief 
propagation that yields an approximate result. 

^Sparsity is usually expressed by the ^0"”norm” according to ||x||q < K. 
The £i-norm relaxation was proven in (2 to very often yield the same result in 
high dimensions, while introducing a favorable convex optimization problem. 
































In AMP, A is a design parameter. For the optimal choice 
of A, it was shown in ll^ that the fixed point of the AMP 
solution conicides with the LASSO solution in the asymptotic 
regime where M/N = const, while N, M ^ oo. 

As discussed in JH, 1321, AMP d eco uples the estimation 
problem associated to measurement l{2j into N uncoupled 
scalar problems in the asymptotic regime; 


. asympt. 

y = Ax + w->■ 


Ui = Xi + Wi 


UN = Xn + Wn 


(13) 


where the effective noise asymptotically obeys w„ ~ A/"(0, /3). 
Note that it is GaussiarH, and j3 > Revisiting the LASSO 
problem in the scalar case, i.e.. 


a?LASSo(M; A) = argmjn {u- xf + X\x\ 
it is known that the soft thresholding function 

U + T if U < —T 

p(m; t) = < 0 if—r<u<r 
u — T \f u> T 


(14) 


(15) 


admits a (possibly optimal) solution to 1(141 see 0. The soft 
thresholding function acts as a denoiser, i.e., it sets values 
below a certain threshold to zero. This function is also found 
in the AMP algorithm - our implementation is stated by 
Algorithm [I] - where it is applied entry-wise on the decoupled 
measurements x + A"'’r, i.e., on Un = Xn + (A:^„)"'"r. The 
iterations are stopped once the change in the estimated signal is 
below a certain threshold - controlled by Ctoi - or the maximal 
number of iterations is reached. For a detailed derivation 
of AMP, the interested reader is refered to 0-191. 


D. Bayesian optimal Approximate Message Passing (BAMP) 

While the standard AMP algorithm implicitly assumes a 
(sparsity enforcing) Laplacian signal prior, its Bayesian opti¬ 
mal version BAMP allows to specify arbitrary signal priors 
fx^{xn), individually for each entry of the signal vector — 
this is a key feature that will be exploited by the proposed 
algorithms for structured sparsity. As before, we will stick to 
the main features and refer to Q-© for details. 

BAMP builds on the same decoupling principle OH, El 
as AMP, which is valid in the asymptotic regime and ap¬ 
proximately satisfied in fin ite d imensions. Let us discuss the 
decoupled scalar problem EM Un = Xn + Wn, where we 
know that w„ ^ Af{0,f3). The (entry-wise) posterior of the 
decoupled problem thus reads, using Bayes’ theorem, 

/xn|Un “ ~n /u,i |Xn I ) /xn 1 (16) 

Jun{Un) 


Algorithm 1 AMP 

1 

initialize x* = 0 and r* = 

y for f = 0 

2 

do 


3 

t = t + l 

[> advance iterations 

4 


[> compute threshold 

5 

x‘ = 77(x* 1 -f A'^r* 

; r) > soft thresholding 

6 


> compute sparsity 

7 

r* _ y _ _|_ 5j.t 1 

> compute residual 

8 

while X* - 2 > Stoi 

x‘“^ 2 t < fmax 

9 

return x = x* 

[> recovered sparse vector 


Algorithm 2 BAMP 

1 

initialize x* = 0 and r* = y 

for f = 0 

2 

do 


3 

t = t + l 


4 

u‘-i =x‘-i+ATr*-i 

> decoupled measurements 

5 


> effective noise estimate 

6 

X* = F(u*-i;/3‘-i) 

[> estimate signal 

7 

r* =y-Ax* + r‘-i^ 


8 

while X* - x‘ ^ 2 > Stoi 

x‘ 1 2 t < fmax 

9 

return x = x* 



BAMP utilizes the following functions 171-191: 


^(Un, P'} — ®lx,i I Lin — Un, /?}, 

G{Un, /3) — ^Lirx,^ {Xn | Un — Un, /?}, 


F'{i 


;/3) = —F(un;/3). 

dUn 


(17) 

(18) 

(19) 


The conditional expectation EM yields the scalar MMSE 
estimate of Xn given the decoupled measurement Un- Our 
impl ementation of BAMP is stated by Algorithm |2. Functi on 
(17)1 is applied entry-wise on a vecto r arg ument, IdSlI is 


typically an intermediate step to computed 191 Note that for a 
specified signal prior, functions I(17l4(19l admit closed form 
expressions. We will now specify those for a sparse binary 
and a sparse Gaussian signal prior, respectively. 

Sparse Binary Signal Prior: for Xn G {0,1}, the prior reads 


fx„{Xn) = 7nd(Xn) + (1 “ 7n)d(Xn - 1), 


( 20 ) 


where jn indicates th e probabili ty of Xn being a zero. In 
this setting, functions (1711 - (19ll boil down to 


F{Un-,ff7n) = 


l + exp(l^+log^)’ 
GiUn, (d, 7n) — F{Un, /?, F{Un, /3, ^n) , 

F {Un‘,ff7n) — y^G{Un', Id ,7n) • 


( 21 ) 

( 22 ) 

(23) 


where /j„(w„) — f'ir,\>^S'^n\xn)fxn{xn)dxn and Sparse Gaussian Signal Prior: for Xn € {0, A/’(0, CTx^)}, 

fun\xAun\xn) = A/'(u„|x„,/3) . Instead of soft thresholding, the prior reads 


^This is an assumption that is satisfied in the asymptotic regime. 


fxSxn) = 7„5(x„)-f(l-7„)A/'(a:„|0, cr^^), (24) 

































where 7 „ indicates the probability of Xn being a zero. 
In literature, this case is also known as the Bernoulli- 
Gaussian case. Functions 1(17)14(19)1 calculate as 


F{Un,l3,J, 

F'{u. 


= PM{Unnn,q) + 'm{Un,-fn,q), 

1 


— pG(Un] I3,^n)i 


with q = and the auxiliary functions 


M{Un,Jn,q) = 


q 


1 


1 + g 1 + m{Un,^n, q) ’ 


(25) 

(26) 

(27) 


(28) 


L(x„) = log 


P(x„ = 0 ) 


= log- 


In 


(32) 


P(x„ = 1 ) 1 - 7„ 

A large positive value indicates a high probability of being 
a zero, a large negative value a high probability of being 

‘^L-values are log likelihood ratios in the context of coding. They are 
typically used in soft-input channel decoding or in iterative decoding. 



f \ 3'”- /T~r~ ( q \ 

mK.7„,5) = 5 . 3 ^ 71 +^exp ■ 

(29) 

If the number of nonzero entries K is known a priori, we 
choose 

7n = l-^,VneV = {l,...,iV}. (30) 

In case uf unknown sparsity, one has to assume a certain 
sparsity and plug in an estimate for K. 

III. Recovery oe Group Sparse Signals 

In the group sparse case, signal vector x is partitioned into 
Nq groups such that the groups partition (non-overlapping 
case) the total support set V = {1, ...,iV}: 

No 

(31) 

s=i 

where Qg contains the signal vector indices that correspond 
to group g. In case of overlapping groups, the intersection of 
two different groups may contain elements. The signal vector 
entries that correspond to a group are either all zero or all 
nonzero — knowing the groups reflects the a priori knowledge 
of the signal structure. An exemplary group assignment on a 
factor graph is depicted in [Figure 2i While the groups may 
vary in their size \Gg\, the signal vector x is assumed to be 
iT-sparse, which typically implies that the number of nonzero 
groups is small. 

Consid ering BAMP in the sparse signal case, the priors (20)1 
and (24)1 allow to modify the probability that a signal entry Xn 
is zero, for each entry of x individually. This is the key feature 
exploited by Bayesian Optimal Structured Signal Approximate 
Message Passing (BOSSAMP). 

A. BOSSAMP and Group Sparse Binary Signals 

Binary signals allow for a convenient computation of soft 
information in terms of L-value^ ||25]| . Il26ll : 


a one. If we consider the decou pled measurements (131 the 
conditional L-values read, using l( 2 ll 

F{^ri ~ 0|t*n ~ tin) 


L(X„|U„ = Un) = log 
== log 


- l|Uj 7 - ttyi) 

1 F(Un^ Pi 7n) 


F(Un] Pi 7n) 

1 27/r), , 7 n 


(33) 


log- 


2/3 ' 1 - 7 „ 

If we compute the conditional L-values in each iteration of 
BAMP, i.e., compute the likelihood of being a zero for each 
signal entry estimate, we obt ain fo r entry Xn at iteration t (cf. 
line 6 of Algorithm |3, using (33)h : 


L{Xn 


l-2ui 


u„ =u„ = 


log- 


Tn 


„t-l ■ 


(34) 


2/3‘-i ■ - 1 - 7 ' 

The key feature of BOSSAMP is to use the L-values 1(34)1 
of iteration t as extrinsic a priori information II26I in the 
subsequent iteration t + 1. To that end, we calculate the L- 
values that accommodate the innovation of the new iteration: 

'-*-1 1 - 2 ul-i 




) - log 


In 


1 -7i 


t -1 


2/3 


t -1 


(35) 


To exploit the group structure, we introduce the binary 
extrinsic group update 

—i / +_i n\ -^0 


L„ = C/G(u‘-i,/3‘-\7°):=i„+ E ^ 

l^Gg\n 


Vn e Qgi'^g G { 1 , ■.■,Ng}i 


In , V- (36) 


2/3 


t-1 


which yields L* = [L*,..., L^]"”". This can be interpreted as 
follows: L„ is the static prior knowledge about the n-th entry, 
and \n is the extrinsic information of the rest of the 

group that contains the innovation of the current iteration. If 
the extrinsic information provides a positive L-value, entry Xn 
becomes more likely to be a zero rather than a one. 

After the extrinsic group update, the signal prior is updated 
accordingly for the subsequent iteration. We therefore intro¬ 
duce the prior update 

1 


Tn = Up{L^) := 


1 -I- exp (- 


(-L) 


, Vn G 12 , 


(37) 























































Algorithm 3 BOSSAMP for Group Sparse Signals 


init. x‘ = Oat, 

do 

t = t + l 
= x‘ 


= y and 7 * = Iat- ^ for f = 0 


x* = i^(u‘ /3* ^,7* ^ 

_ T A I „t-i J_ 

l‘ = C/g(u*-i,/3*-i,70) 


r = y - Ax‘ + r‘-i ^ Ell /3‘-\ 7‘-' 


7‘ = ?^p(L ) 
while ||x* — x*“^| 
return x = x‘ 


> etoi X 


t-i 


where we used L-value definition (32')[ 

By including these two steps in the BAMP algorithm, we 
obtain the BOSSAMP alrarithm for grou p sparse sig nals that 
is outlined in Algorithm — functions 1(2111 (23ll and (361 
are utilized for sparse binary signals. The zero pro babil ities 
are initialized as 7 ° = 1 — ^, Vn G V, according to (30l[ 




(39) 


the marginalization l(391l becomes equivalent to l(381l — we 
successfully reformulated (see ll39l l the Gaussian mixture to 
involve a binary latent variable that can be estimated by 
the E-step of the EM algorithm. The E-step computes the 
probabilities 


) 

> extrinsic group update 
> prior update 
and t < (max 


— l|Un — — 


(l)/u„|z„,i — 1) 

/u„(Un) 

El —1 1 ^j ) 


(41) 


which are called responsibilities', P(z„ ^ = l|u„ = m„) is the 
responsibility of mixture component i for explaining observa¬ 
tion Un- They are used to estimate the zero probabilities 7 ^ 
in each iteration t, which is our E-step. The estimate reads 


In ■= Pi.'^n,! = 1 |U„ = < ^ = 7 ^ 


7rW«-i|0,/3‘-i) 


B. BOSSAMP and Group Sparse Gaussian Signals 

In the sparse Gaussian case, we are not able to express L- 
values directly as in the binary case. We tackle this problem 
by introducing a binary latent random variable inspired by the 
E-step of the Expectation Maximization (EM) algorithm JSS)- 
BOl . This allows us to estimate the zero probabilities 7 ^ in 
each iteration t of BOSSAMP. Consider the p rior d istribution 
of the decoupled measurements Un = Xn+ Wn 1(13ll which can 
be expressed as a Gaussian mixture 1^ . ll40l : 

/lj„('w«) = 7nA/'(Mn|0,/3) -f (1 - 7„)A/'('u„ |0,/3 -f alj 

2 

— ^ ^ \P’i ) ) ■ 

1—1 

(38) 

We distinguish between two Gaussian distributions; 

• Distribution z = 1 is associated to the zero entries in the 
original signal x. The corresponding estimates solely 
contain the effective noise w„ ^ A/'(0, /3) (i.e., Xn = 0). 
We thus set pi = 0 and af = /3. 

• Distribution z = 2 is associated to the nonzeo entries 
where Un contains the noisy signal entries {xn 0 ). 
Therefore, /i 2 = 0 and cr| = /3 -f . 

The mixing coefficients i determine the probability of the 
individual mixture components: is the probability of a 

zero entry, and an ,2 = 1 —ctra.i is the probability of a nonzero 
entry. In order to estimate these probabilities, a latent binary 
random variable z„,i G { 0 , l},z = 1 , 2 , is introduced; 


7rW(zzriO,/3‘-i) + (l-7r')Mwn 10,a2+/3‘-i) 

(42) 

As our latent variable z„ 1 is binary, we can formulate the 
L-values 


L(z„,i|u„ = ujj = log 


log 


P{Zn,l =l\Un= U*n 
P{Zn,l = 0|u„ = U 
% 


t-l\ 


(43) 


l-7n 


that indicate how likely signal entry Xn was to be zer o (imp lies 
Zn.i = 1) given the measurement zzE^- Similar to l(35ll we 
introduce the innovation L-values 


7 ; 


L„ = L(z„,i|u„ = zz„ )-log--^ 

1 - 7n 


= log 


7n” (1-7^) 

Ar«-i|0, 

Ar(uE'| 0 ,a 2 +/3‘-i) 


(44) 


1 , 

= 5 log- 


1 




t -1 


2/3‘-i(/3‘-i+0- 

They are utilized for the Gaussian extrinsic group update 

—i -^-r / +_i n\ -^0 


L„ = C/g(u‘-i,/3‘-i,7°):=L„+ ^ L 

le0g\n 


= log 


1-7J? 




< 




leQg\r 




t-1 




VnG5g,V5G{l,...,WG}, (45) 


where z„.i -l-z „_2 = 1 and /u„|z„ i(un\^n,i = 1) is a Gaussian 
distribution with mean and variance af. Defining the 
Probability Mass Eunction (PME) of z„ ^ as 

P^7i,ii^n,i^ “ (1 ^n,i)^{,^n,i) 4” fXn,iS(,Zn,i 1), (40) 


which is similar to l(36l[ The s ubseq uent prior update is the 
same as in the binary case, see (3711 
The same BOSSAMP algorithm body as stated b y Algo- 
rithm Q is used — note that the functions (251 (27ll and (45ll 
are utilized for sparse Gaussian signals. 

















































Algorithm 4 BOSSAMP for Jointly Sparse Signals 

1 : init. X* = OjvxiVB^ ^^ = ^nxNb-§ and = y{, 
yb G B = {1, Nb} and t = 0 


2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 


do 


t = t + l 

for 5 = 1 to Nb do > BAMP iteration for all blocks 




A( 6 )Tr 




■'b — M 


r‘ = yb-A(^) 


^i + ri-^4.EnP'i 


—t 


L =C/G(U‘-i,/ 3 ‘-i,rO) 


"h M 
->0 


r* = c/p(L) 

while ||X(:)*-X(:)*-i| 
return x = x* 


h;/3b '>7^.6') 

> extrinsic group update 
> prior update 


, > Ctol ^(0 


i-11 


and t < 


V 


V 



A. BOSSAMP and Jointly Sparse Binary Signals 

Compared to the group sparse case, we essentially have to 
extend the indexing from vectors to matrices. In particular, 
r'* = [7ii ■•■i7 /7 b] contains the zero probabilities of iteration 
t, where (r‘)„ {, = 7 ^ f, is the n-th entry of block b. Similarly, 

(U*)b,6 = < and (t)n,b =X.b, and (3^ = [/?f, 

The regular BAMP iteration is executed independently for 
each of the Nb blocks (jointly sparse signals). Once iteration t 
is finished for all block s, a collective binary extrinsic group 
update, similar to ItJbli is executed to exploit the joint support 
structure among the Nb signals; 


L„,, = t/G(U*-\/3‘-\7°.b) ■■=Ln,b+ E 

leB\b 

, 7n,b ^ ' 

= log-;—ni-+ E 


1 _ .^0 

'n,b i^B\b 


3t-l ’ 


2 Pl 

VnGV,V6eS. (49) 


Afterwards, the zero probabilities are up dated for the subse¬ 
quent iteration by executing prior update lOTll which is now 
applied entry-wise on a matrix. 

The BOSSAMP algorithm for the jointly sparse case is 
depicted by Algorithmic. 


B. BOSSAMP and Jointly Sparse Gaussian Signals 

Considering the block structure indexing, the collective 
Gaussian extrinsic group update, similar to llTSll reads 


Fig. 3. Factor graph of two jointly sparse signal vectors xi and X 2 . 


IV. Recovery oe Jointly Sparse Signals 

In the jointly sparse case, we consider Nb signal vectors 
Xf, G , b G B = {1 ,..., Nb}, that share a common support 

S^ = S^„ybGB, (46) 

where contains the indices of the nonzero entries in Xf,. 
In the most general case , the compressive measurements are 
formulated similar to 1 (2)1 as 

yh = A<^*’^Xh-f Wfc, (47) 

where yf, G G and Wf, G . Let us 

collect the data blocks in matrices; Y = [yi, ...,yb, ...,y 7 VB]> 
X = [xi, ...,X6, ...jXatb] and W = [wi, ..., Wb,..., w^Vb]- If 
all sens ing m atrices are equal, i.e., A = A}^\'ib G B, we can 
rewrite 1(47)1 as 

Y = AX + W. (48) 


The joint sparsity is expressed by the rows of matrix X, 
whose entries are either all zero or all nonzero. These rows 
can be interpreted as Nq = N groups, where each group 
Gg,g G Ng, contains \B\ = Nb elements. The BOSSAMP 
algorithm that exploits the joint sparsity is thus very similar 
to the one in the group sparse case. An exemp lary factor graph 
with Nb = 2 blocks is depicted in Figure 3[ 


Eb = C^G(U‘-\/3*-\7:j.b) :=E,b+ E ^-7 

leB\b 
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= log 


7*^ h 

'n,b 


it-l I 2 

I + 


vO ■ 2 


^ In.b Pi 




n,l ) 


l^B\h 


VnGV,V6GS. (50) 


The resultin g L is then used for the subsequent entry-wise 
prior update (37)1 


V. Recovery of Arbitrary Structured Signals 

While the previous sections presented BOSSAMP for 
the prominent examples of group/jointly sparse binary and 
group/jointly sparse Gaussian signals, this section will discuss 
the generalization to arbitrary signals. 


A. Arbitrary Sparse Signals 

BOSSAMP is potentialljH applicable to arbitrarily dis¬ 
tributed signals — the crux is the utilization of the la- 
tent variable z„ j as demonstrated in the Gaussian case, see 


Section III-B. Soft information in terms of L-values is com¬ 


puted and exchanged extrinsically among the group (or joint) 
structure of the signal(s). In the following, we consider the 
group sparse case; the j ointly sparse case just differs in the 
indexing as discussed in ISection IV- aI 


^At this point, we can not guarantee the stability of loopy belief propagation 
and BOSSAMP for all arbitrary prior distributions. 















































C onsid er an arbitrary sparse signal prior (similar to 1(20) 

andli^ 


fu^{Un) = / fx^{v)h{Un- V)dv 

JM. 

= 7„A/'(u„| 0,^) + (1 - 7„)/a 


(52) 


I (tTrt), 


where = f^f aJv)h,(un - v)dv. Our zero prob¬ 

ability estimates (cf. 1(42)1 for Gaussian case) now compute as 


In 


7^ ' 10,/3‘-1) + (1 - li- ' ) {ui- ' ) ’ 

and the innovation L-values are obtained as 


L\=\og^ 


In (1 - 7n) 

The extrinsic group update is performed with these L-values: 


l; = C/g(u‘-i,/3*-i 


> 7 °) ^ L\, 

le0g\n 


l‘ =(7g(U*-\/3‘-\7° 


,b) ^n,b 


E E 


A. Figures of Merit 

The measurement Signal-to-Noise Ratio (SNR) is defined 


fxni't'n') — 7?i^(^?i) ~b (1 'Jn) fantin') ^ (51) 

where /a„(a:„) is the distribution of the nonzero entries in x. 
Remember that random variable u„ is the sum of the two in¬ 
dependent random variables x„ and w„ . T he prior distribution 
of the decoupled measurements (cf. 1(38)1 for Gaussian case) 
is, therefore, obtained via convolution 


as 


SNR = 


lAxll 


Ev 


{IIWII 2 } 


l|Ax||- 

Maf 


(57) 


the noise variance cr^ is set accordingly for each realization 
of A and x to realize a certain SNR during simulation. 

The Normalized Mean Squared Error (NMSE) between 
original signal x and its estimate (recovery) x is defined as 


NMSE = 


X - X 


(58) 


(53) 


(54) 


it gives indication about the overall recovery performance. 
The Ealse Alarm NMSE (EANMSE) is defined as 


EANMSE = 


X-c — X-c 


(59) 


V 5 G {1,...,Ag}. (55) 

Afterwards, prior update 1(37)1 is executed. 

Eo llowing this approach us ing, e.g., the sparse binar y prior 
(20)1 the resulting L-values 1(54)1 coincide with (35)1 Note, 
however, that not every arbitrary distribution will entail good 
recovery performance in the group sparse case; the employed 
loopy belief propagation at the heart of BOSSAMP may 
become unstable and require extensions such as damping, see 

El, HD. 

B. Arbitrarily Structured Signals 

Up to now, we considered either group sparse or jointly 
sparse signals. It is straightforward to extend BOSSAMP to be 
applicable to jointly sparse signals that, individually, exhibit 
a group structure. To that end, the group update has to be 
extended as follows: 


where the complementary signal support contains the 
indices of the zero entries in x. The EANMSE is a measure 
to quantify the strength of the false alarms in x. 

B. Comparative Schemes 

We compare our implementations of AMP, BAMP and 
BOSSAMP (MATLAB code will be made available at ll43l ) 
to the following schemes: 

Group LASSO (GLASSO): in order to incorporate the 
group structure of x, the gr oup L ASSO iffSl - lflTl replaces 
the £i-norm regularization in (1 1)1 with the sum of £ 2 -norms 
of the groups: 


XGLASSo(y; A) = argmm - ||y-Ax]]^ -b | 

I ff=i 




ieQg\n jeB\b 

VpG Ag},V&gS. (56) 

As the \B\ = Nb signals are jointly sparse, all of them exhibit 
the same i ndivid ual group structure, i.e., the same groups 
Qg. Update 1(56)1 accounts for the group as well as the joint 
sparsity. 

VI. Comparison and Numerical Results 
Let us first introduce the figures of merit for comparison. We 
then highlight the schemes to which we compare BOSSAMP 
to. The numerical setup is described, and the simulation results 
are provided, followed by a discussion. Note that for the sake 
of brevity, we only present results for the group sparse case. 


(60) 

In case of \Qr, \ = 1 and Nq = N, it collapses to the standard 

LASSO EB 

We use an implementation via the alternating direction 
method of multipliers that is described in nil and whose 
MATLAB code is available in ll44l . 

A similar approach to solve the group LASSO via the 
alternating direction method was presented in lITSl ; however, 
our simulations have shown that Il44l yields superior results. 
Hybrid Generalized Approx. Message Passing (HGAMP); 
Generalized Approximate Message Passing (GAMP) was 
introduced in HU to extend the classical Gaussian AMP 
framework - on which we build in this paper - to a more 
general setting that allows for arbitrary output chan nels and 
is thus not restricted to additive Gaussian noise in 1(2)1 An 
extension to incorporate structured sparsity was introduced 
in II 20 I and is termed HGAMP. It was shown to outperform 
group orthogonal matching pursu it CD and the group 
LASSO in terms of NMSE We use the MATLAB 
implementation of HGAMP that is provided in 
































C. Numerical Setup 

For AMP, BAMP and BOSSAMP, the stopping criterion 
was set to etoi = 10“'*. The maximum number of iterations 
was set to = 100 , for all algorithms. 

For AMP Algorithm [l], we chose A = 2.678Ar“° *®* 
(NMSE minimizing heuristic for = 1 000, see BhlU . 

For GLASSO, the regularization parameter A is chosen 
according to the example provided in ll4^ . and the augmented 
Lagrangian and over-relaxation parameters were chosen as 
p = I and a = 1, respectively, as suggested in ll44ll . 

For HGAMP with sparse Gaussian signal prior, follow¬ 
ing options were selected (suggested by the toy example in 
lUJl): step=l, removeMean=true, adaptStep=true. 
In case of sparse binary signal prior, following op¬ 
tions were changed in order to mitigate numerical issues: 
step=0.1, removeMean=false. In function estim of 
class GrpSparseEstim.m, the minimum and maximum 
value of the sparse probability rho was set to 10 “*^ and 
1 — 10 “*^, respectively; the same values where chosen for 
the minimum and maxium value of prO — this improved the 
recovery performance, and mitigated numerical issues in the 
binary case. 

For the ’’variable SNR” and ’’variable M” curves, the 
results are averaged over 1 000 random realizations. In each 
realization, sensing matrix A and signal vector x are newly 
generated: A features i.i.d. zero mean Gaussian entries with 
unit f 2 -norm columns, and x has dimension N = 1 000 with 
K = 160 nonzero entries, entailing a zero probability of 
7 „ = 0.84 . The nonzero entries are one in the sparse binary 
(20)1 and i.i.d. Gaussian with zero mean and variance 


case _ 

CTx = 1 in the sparse Gaussian case (2411 We consider 
non-overlapping equally-sized groups in x and compare three 
different cases: 

. Group size l^gl = 2,Vg G {1, 2,..., 500}. With A = 160, 
this implies that we have 80 active groups out of Nq = 
500 total groups. 

• Group size \Qg\ = 5, Vp G {1,..., 200} (32 active groups). 

• Group size \Qg\ = 8, Vp G {1,..., 125} (20 active groups). 
For the ’’empirical phase transition” curves, we consider 

an undersampling (^) vs. sparsity (^) grid, where the 
values range from 0.05 to 0.95 with stepsize 0.05, respectively. 
At each grid point, 200 realizations are simulated. Let us 
introduce a success indicator for each realization r: 

1 NMSE^ < 10“* 

0 else ■ 

The average success is obtained as S' = 
empirical phase transition curves are hnally obtained by plot¬ 
ting the 0.5 contour of S using MATLAB function contour. 

D. Numerical Results and Discussion 
Eor convenience 


Sr = 


Table II highlights how the various 


schemes utilize pr ior i nfor matio n, i.e., the sparsity, the 
Bayesian prior (see (20ll and l(24lll . and the group structure. 
In the following, we call BAMP, BOSSAMP and GAMP the 
Bayesian message passing-based schemes. 



Sparsity 

Bayesian prior 

Group stnacture 

GLASSO 

✓ 

X 

/ 

AMP 

/ 

X 

X 

BAMP 

/ 

✓ 

X 

BOSSAMP 

/ 

✓ 

/ 

HGAMP 

✓ 

✓ 

/ 


TABLE I 

Utilization of prior information. 


Eigure 4l shows the variable SNR results for the sparse 


binary case, where the num ber of measurements was hxed to 
M = 590 (inspired by Ei] with c ft: 2, see ED). BOSSAMP, 
HGAMP and GLASSO depend on the group size that is 
indicated in brackets, while AMP and BAMP do not exploit 
the group structure. AMP and G LA SSO o nly include a sparsity 
constraint steered by A in (1 111 and (60ll respectively. BAMP, 
BOSSAMP and HGA MP exploit the full prior knowledge, i.e., 
they utilize prior l(20l[ Doing so, these schemes exhibit a steep 
transition to the success state (NMSE < —40dB), provided 
that the SNR is sufficiently high. It is evident that larger groups 
strongly improve the results of the schemes that exploit the 
group structure, i.e., BOSSAMP, HGAMP and GLASSO. The 
EANMSE draws a similar picture as the NMSE. It is notable 
that the GLASSO is able to effectively null the false alarms, 
given reasonably sized groups. Once the SNR is large enough 
to ensure successful recovery, the number of iterations of the 
Bayesian message passing-based schemes stays very low. The 
overall best recovery performance is obtained by our proposed 
BOSSAMP algorithm, followed by HGAMP and BAMP. 

Eigure 5l shows the variable M results for the sparse 


binary case at SNR = 25 dB. We observe very steep success 
transitions for the Bayesian message passing-based schemes. 
In particular, BOSSAMP with group size 2 yields successful 
recoveries above 140 measurements, while for group size 8 , it 
only requires slightly more than 20 measurements. In compar¬ 
ison, HGAMP is successful above 200 and 40 measurements, 
respectively, and BAMP requires around 300 measurements. 

Eigure 6l shows the variable SNR results for the sparse 


Gaussian case with M — 590. The message passing-based 
schemes do not exhibit th e same st eep success transitions as in 
the sparse binary case in Eigure 4 but a gradually decreasing 
NMSE over increasing SNR. The performance of BOSSAMP 
and HGAMP is very similar, particularly for large group sizes. 
The EANMSE shows a steeper decrease over SNR, and it is 
again notable that GLASSO features a steep decay, at the ex¬ 
pense of stagnating NMSE (the algorithm utilizes thresholding 
that leads to a sparse solution but lowers the energy in the 
nonzero entries). GLASSO overall requires the least number 
of iterations, closely followed by BOSSAMP. However, the 
Bayesian message passing-based schemes strongly outperform 
AMP and GLASSO in terms of NMSE and EANMSE. 


Eigure tI depicts the variable M results for the sparse 
Gaussian case at SNR = 25 dB. The steep success transitions 
of the Bayesian message passing-based schemes are back, the 
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Fig. 4. Vaiiable SNR in the sparse binary case. 


Fig. 5. Variable M in the sparse binary case. 


NMSE is lower bounded due to finite SNR. It is again evident 
that BOSSAMP features earlier phase transitions (requires 
fewer measurements) than HGAMP, which is particularly 
apparent at small group sizes. The number of iterations behave 
similarly for BOSSAMP, HGAMP and BAMP; a distinctive 
peak accompanies the phase transition event, after which 
the iterations decrease. Due to the same message passing 
foundation, BOSSAMP and HGAMP behave similarly. 


Figure 8l illustrates the empirical phase transition curves 


for the sparse binary case. The standard AMP algorithm 
exhibits the worst performance and is strongly surpassed 
by BAMP that incorporates the Bayesian prior knowledge. 
Additionally exploiting the group structure leads to a supreme 


performance which can be seen in the BOSSAMP phase 
transition curve. Note that the group size in this example was 
chosen really small as \Gg\ =2, yet already results in a big 
improvement. 


Finally, [Figure 9l illustrates the empirical phase transition 
curves for the sparse Gaussian case for various group sizes. 
Clearly, an increase in the group size strongly improves the 
recovery performance of BOSSAMP and HGAMP and leads 
to earlier success transitions. While BOSSAMP and HGAMP 
behave similarly at large values the strongly undersampled 
regime causes problems for HGAMP. 
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Fig. 6. Variable SNR in the sparse Gaussian case. 

VII. Conclusion 

We introduced BOSSAMP, a novel iterative algorithm to 
efficiently recover group sparse or jointly sparse signals. 
The algorithm is based on the AMP framework introduced 
by Donoho, Maleki and Montanari and exploits the known 
signal prior distribution. By introducing an extrinsic group 
update and a prior update step in each iteration, the known 
signal structure is incorporated into the entry-wise MMSE 
estimation of the standard BAMP algorithm; considering a 
specific element of a group, the extrinsic group update step 
collects soft information from the remaining group elements. 
L-values are accumulated according to the turbo principle and 
a belief about whether a specific group element was zero 



Fig. 7. Variable M in the sparse Gaussian case. 


or nonzero arises. According to this belief, the subsequent 
prior update step updates the zero probability of the prior 
distribution that is utilized for the MMSE estimation in BAMP. 

We derived the group update step for the sparse binary re¬ 
spectively the sparse Gaussian case and provided simple closed 
form expressions. Eurthermore, we sketched how BOSSAMP 
is potentially applicable to arbitrary sparse signals. Simulations 
have shown that BOSSAMP outperforms current state of the 
art algorithms, including HGAMP that builds on the same 
message passing foundation. However, HGAMP is based on 
GAMP that - in contrast to the standard AMP framework that 
we utilize in this work - is applicable not only to the additive 
Gaussian noise case but to a more general class of problems. 































































































Undersampling M/N 

Fig. 8. Empirical phase transition curves for sparse binary prior. Recoveries 
are successful (NMSE < 10“^) in the regime below a curve. 


= 1 000 sparse Gaussian 



Undersampling M/N 

Fig. 9. Empirical phase transition curves for sparse Gaussian prior. Recov¬ 
eries are successful (NMSE < 10“^) in the regime below a curve. Results 
are plotted for three different group sizes \Qg\ = {2, 5, 8}. 


including nonlinear output relations. While being more gen¬ 
eral, HGAMP encompasses a more difficult implementation, 
and simulations suggest that due to a series of (additional) 
approximations, it sacrifices some performance in comparison 
to the standard AMP framework E-El. 

Currently, the signal prior distribution is assumed to be 
known. For the sparse Gaussian and the more general Gaussian 
mixture case, the parameters can be estimated using the EM- 
algorithm, whose application in conjunction with GAMP has 
been propagated in izi, isi. Another promising and more 
general approach was proposed in B9l , where Stein’s unbiased 
risk estimator was incorporated into BAMP. 

In conclusion, we have shown that the utilization of the 
(known) signal structure leads to significant improvements — 
on the one hand, fewer measurements are required to obtain 
a certain recovery performance (improved phase transition), 
while on the other hand, the recovery becomes more robust 
with respect to noise if the number of measurements is 
fixed. BOSSAMP is a versatile, easy-to-implement recovery 
algorithm with great performance. 
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